glTF rendering

Some notes on loading and rendering glTF scene files.

glTF loading

If you are using C or C++, the cgltf library is a single-header / single-source C library for reading glTF files.

Top-down vs Bottom-up Loading

After having implemented a fair amount of glTF loading and rendering, it seems there are basically two general strategies to go about this:

- Top-down: start from the scene and its nodes, and load meshes, materials, textures, and buffers as they are encountered.
- Bottom-up: load the file's resource arrays first (buffers, images, textures), then build meshes, materials, and the scene graph on top of them.

I started and stayed with the latter, but the former has some benefits too.

The top-down approach gives you context. For example, and as explained in the next section, you will be able to determine how a texture is used based on the material that references it, and consequently, you will know what colour space to expect the texture in. The down-side of the top-down approach is that since resources like textures or buffers can be shared among meshes, you will need to implement some sort of caching to avoid loading the same resource twice. The bottom-up approach has the opposite trade-offs: resources can be loaded in a linear fashion, but resources like textures would have to be loaded lazily until the context is known.
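The caching needed by the top-down approach can be very simple. Below is a minimal sketch: a fixed-size cache keyed by image URI, where the "handle" stands in for an actual GPU texture upload. The names (texture_cache, cache_get_or_load) are illustrative, not part of cgltf or any real API.

```c
#include <string.h>

/* Sketch: deduplicate texture loads when walking the scene top-down.
 * Keys are image URIs; values would be GPU texture handles in a real
 * renderer (here just a counter standing in for an upload). */
#define CACHE_CAP 64

typedef struct texture_cache {
  const char* uris[CACHE_CAP];
  unsigned    handles[CACHE_CAP];
  int         count;
  unsigned    next_handle; /* stands in for an actual GPU upload */
} texture_cache;

static unsigned cache_get_or_load(texture_cache* c, const char* uri) {
  for (int i = 0; i < c->count; ++i) {
    if (strcmp(c->uris[i], uri) == 0) {
      return c->handles[i]; /* already loaded; reuse */
    }
  }
  /* Not cached: "load" the texture and remember it. */
  const unsigned handle = ++c->next_handle;
  c->uris[c->count]     = uri;
  c->handles[c->count]  = handle;
  c->count++;
  return handle;
}
```

A real implementation would also key on the sampler (two textures may share an image but differ in wrapping/filtering) and evict nothing, since glTF resource counts are known up front.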

Textures and Colour Spaces

Albedo (base colour) and emissive textures are given in sRGB space. Normal maps, metallic/roughness, occlusion, and other non-colour textures are in linear space. Make sure to use the right format (RGB vs sRGB in OpenGL). It is also best to load the textures in the right format instead of performing an sRGB->Linear conversion in shader code. [discussion].
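Picking the right OpenGL internal format then reduces to a small mapping from texture usage to format. A sketch, where the enum and function are my own illustrative names (the two GL constants are the standard values from the OpenGL headers):

```c
/* GL_SRGB8_ALPHA8 tells the driver to convert sRGB->linear on sampling;
 * linear data (normals, metal/roughness, AO) must not get that treatment. */
#define GL_RGBA8        0x8058
#define GL_SRGB8_ALPHA8 0x8C43

typedef enum texture_usage {
  TEXTURE_USAGE_BASE_COLOUR, /* sRGB-encoded */
  TEXTURE_USAGE_EMISSIVE,    /* sRGB-encoded */
  TEXTURE_USAGE_NORMAL_MAP,  /* linear */
  TEXTURE_USAGE_METAL_ROUGH, /* linear */
  TEXTURE_USAGE_OCCLUSION    /* linear */
} texture_usage;

static unsigned internal_format_for(texture_usage usage) {
  switch (usage) {
  case TEXTURE_USAGE_BASE_COLOUR:
  case TEXTURE_USAGE_EMISSIVE:
    return GL_SRGB8_ALPHA8;
  default:
    return GL_RGBA8;
  }
}
```

The returned value would be passed as the internalformat argument of glTexImage2D (or glTexStorage2D).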

Note that the intended use of a texture is not immediately clear when looking at the textures, images and samplers sections of a glTF file. See the DamagedHelmet sample:

"images" : [
    {
        "uri" : "Default_albedo.jpg"
    },
    {
        "uri" : "Default_metalRoughness.jpg"
    },
    {
        "uri" : "Default_emissive.jpg"
    },
    {
        "uri" : "Default_AO.jpg"
    },
    {
        "uri" : "Default_normal.jpg"
    }
],
...
"samplers" : [
    {}
],
...
"textures" : [
  {
      "sampler" : 0,
      "source" : 0
  },
  {
      "sampler" : 0,
      "source" : 1
  },
  {
      "sampler" : 0,
      "source" : 2
  },
  {
      "sampler" : 0,
      "source" : 3
  },
  {
      "sampler" : 0,
      "source" : 4
  }
]

Instead, the use becomes apparent when parsing the materials of a mesh primitive. Below, each index refers to a texture in the snippet above:

"materials" : [
{
    "emissiveFactor" : [
        1.0,
        1.0,
        1.0
    ],
    "emissiveTexture" : {
        "index" : 2
    },
    "name" : "Material_MR",
    "normalTexture" : {
        "index" : 4
    },
    "occlusionTexture" : {
        "index" : 3
    },
    "pbrMetallicRoughness" : {
        "baseColorTexture" : {
            "index" : 0
        },
        "metallicRoughnessTexture" : {
            "index" : 1
        }
    }
}
],

My implementation loads textures lazily. It first scans all textures up front, but instead of uploading them to GPU memory, which requires knowing the format, it returns a list of “load commands”. These commands describe how to read the texture (from disk or from memory) and what parameters to use, with sRGB as the default format. Later, when loading materials, if a normal texture is detected, the relevant load command is patched to use a linear colour space instead. The textures are then uploaded to GPU memory during material loading.
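The load-command idea described above can be sketched in a few lines. The struct and function names here are illustrative, not my actual implementation:

```c
#include <stdbool.h>

/* Sketch: one command per glTF texture. Commands default to sRGB; material
 * parsing patches non-colour textures to linear before the GPU upload. */
typedef struct texture_load_cmd {
  const char* uri;     /* or a pointer into a GLB buffer */
  bool        is_srgb; /* defaults to true; patched during material loading */
} texture_load_cmd;

static void init_load_cmds(texture_load_cmd* cmds, const char** uris, int n) {
  for (int i = 0; i < n; ++i) {
    cmds[i].uri     = uris[i];
    cmds[i].is_srgb = true; /* assume colour data until a material says otherwise */
  }
}

static void patch_to_linear(texture_load_cmd* cmds, int texture_index) {
  cmds[texture_index].is_srgb = false;
}
```

While parsing a material, the indices in normalTexture, metallicRoughnessTexture, and occlusionTexture would each get a patch_to_linear call before the commands are finally executed.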

Tangents

The DamagedHelmet sample has a normal map but no tangent vectors. In such cases, the glTF spec currently says:

“When tangents are not specified, client implementations SHOULD calculate tangents using default MikkTSpace algorithms with the specified vertex positions, normals, and texture coordinates associated with the normal texture.”

However, the MikkTSpace documentation also says:

// Note that the results are returned unindexed. It is possible to generate a new index list
// But averaging/overwriting tangent spaces by using an already existing index list WILL produce INCRORRECT results.
// DO NOT! use an already existing index list.

In other words, if the model has vertex indices like DamagedHelmet, then we should not compute tangents with the model as is. At the very least, we should unindex the model, compute the tangents, and then re-index it. [discussion]
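The unindexing step is mechanical: every index is expanded into its own vertex so that MikkTSpace can assign each triangle corner an independent tangent. A sketch for positions only (normals and UVs would be expanded the same way; re-indexing/welding afterwards is not shown):

```c
/* Sketch: expand an indexed mesh into unindexed triangle corners.
 * positions holds xyz floats per vertex; out must have room for
 * index_count * 3 floats. */
static void unindex_positions(const float* positions, const unsigned* indices,
                              int index_count, float* out) {
  for (int i = 0; i < index_count; ++i) {
    const unsigned v = indices[i];
    out[3 * i + 0] = positions[3 * v + 0];
    out[3 * i + 1] = positions[3 * v + 1];
    out[3 * i + 2] = positions[3 * v + 2];
  }
}
```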

What the glTF Sample Viewer implementation does instead is to approximate the tangent in screen space when the model has no tangents: [source]

vec3 uv_dx = dFdx(vec3(UV, 0.0));
vec3 uv_dy = dFdy(vec3(UV, 0.0));

vec3 t_ = (uv_dy.t * dFdx(v_Position) - uv_dx.t * dFdy(v_Position))
        / (uv_dx.s * uv_dy.t - uv_dy.s * uv_dx.t);

Animation

glTF is very flexible in terms of how animation is specified. The details threw me off a bit.

A glTF “skin” (the “skeleton” in most literature?) contains an array of indices, which point to the joints. Instead of defining a joint data structure, however, glTF re-uses the node data structure to specify joints. So the skin/skeleton’s joint indices are indices into the scene’s nodes. The joint node hierarchy also appears to be relatively detached from the rest of the scene nodes and otherwise plays no role in the scene.

As far as I can tell, the joint/node indices in a skin/skeleton may point to arbitrary nodes in the scene; the joints need not form a contiguous range in the scene’s node array. So if you want the joints tightly packed for better caching, you might be at the mercy of the exporter, or you need additional processing.

The skin/skeleton also need not have a common root joint node; the common root may just be a non-joint node in the scene. If you look at joint nodes only, then a skeleton is generally a disjoint set of joint/node hierarchies. The glTF may specify a common root (called the “skeleton” in glTF), but the specification currently makes that optional.

Finally, a glTF animation contains a list of channels, and a channel targets the joint/node that it transforms. But there is no 1-1 mapping between animations and skins/skeletons; an animation can transform various skins/skeletons at once.
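Evaluating a channel at playback time boils down to finding the surrounding pair of keyframes and interpolating between them. A minimal sketch for LINEAR interpolation of scalar values (real channels hold vec3 translations/scales or quaternion rotations; the function name and layout are my own):

```c
/* Sketch: sample an animation channel with LINEAR interpolation.
 * times must be sorted ascending (the sampler's input accessor);
 * values holds one float per keyframe for simplicity. Inputs outside
 * the keyframe range clamp to the first/last value, per the spec. */
static float sample_channel(const float* times, const float* values,
                            int count, float t) {
  if (t <= times[0]) return values[0];
  if (t >= times[count - 1]) return values[count - 1];
  int i = 0;
  while (times[i + 1] < t) ++i; /* linear scan; use binary search in practice */
  const float u = (t - times[i]) / (times[i + 1] - times[i]);
  return values[i] + u * (values[i + 1] - values[i]);
}
```

For rotation channels, the final line would be replaced by a quaternion slerp like the one shown in the next section.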

In my case, I pack the joints into a separate array of a joint data structure of my own. I also force the hierarchy to have a common root joint by looking for the original roots and parenting them under a new common joint, which gives the skeleton a convenient root node from which it can be traversed.
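Forcing a common root is a small operation once the joints are packed into an array with parent indices. A sketch under my own data layout (parents[i] is the parent of joint i, or -1 for a root; the synthetic root is appended at the end):

```c
/* Sketch: re-parent all root joints under a single synthetic root.
 * parents must have room for joint_count + 1 entries. Returns the
 * index of the new common root. */
static int add_common_root(int* parents, int joint_count) {
  const int root = joint_count;
  for (int i = 0; i < joint_count; ++i) {
    if (parents[i] == -1) parents[i] = root;
  }
  parents[root] = -1; /* the new root has no parent */
  return root;
}
```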

Quaternion Interpolation

Rotations in a glTF skeleton are given using quaternions. You will have to interpolate them. This is my interpolation function:

/// Interpolate two unit quaternions using spherical linear interpolation.
///
/// Note: You might want to normalize the result.
static inline quat qslerp(quat a, quat b, R t) {
  assert(0.0 <= t);
  assert(t <= 1.0);
  const R eps = 1e-5;
  (void)eps;
  assert(R_eq(qnorm2(a), 1.0, eps));
  assert(R_eq(qnorm2(b), 1.0, eps));
  R dot = qdot(a, b);
  // Make the rotation path follow the "short way", i.e., ensure that:
  // -90 <= angle <= 90
  if (dot < 0.0) {
    dot = -dot;
    b   = qneg(b);
  }
  // For numerical stability, perform linear interpolation when the two
  // quaternions are close to each other.
  R ta, tb;
  if (1.0 - dot > 1e-6) {
    const R theta     = acos(dot);
    const R sin_theta = sqrt(1 - dot * dot);
    ta                = sin(theta * (1.0 - t)) / sin_theta;
    tb                = sin(theta * t) / sin_theta;
  } else { // Linear interpolation.
    ta = 1.0 - t;
    tb = t;
  }
  return qadd(qscale(a, ta), qscale(b, tb));
}

(Functions like qscale and qadd have trivial component-wise implementations to scale and add quaternions.)

The implementation is based on the Wikipedia article on Slerp and the relevant chapter in Mathematics for 3D Game Programming and Computer Graphics.

One addition not mentioned in those references is the check:

if (1.0 - dot > 1e-6) {

The idea is that if the two quaternions are close to each other, we fall back to linear interpolation. This is primarily for numerical stability rather than speed: as dot approaches 1, sin_theta approaches 0, and the division becomes ill-conditioned.

References

glTF Sample Viewer

glTF Sample Viewer code